Abstract: We study optimal transmission strategies in interfering
wireless networks, under Quality of Service constraints. A
buffered, dynamic network with multiple sources is considered, 
and sources use a retransmission strategy in order to improve 
packet delivery probability. The optimization problem is for- 
mulated as a Markov Decision Process, where constraints and 
objective functions are ratios of time-averaged cost functions. The 
optimal strategy is found as the solution of a Linear Fractional 
Program, where the optimization variables are the steady-state 
probability of state-action pairs. Numerical results illustrate the 
dependence of optimal transmission/interference strategies on the 
constraints imposed on the network. 

I. Introduction 

Retransmission-based error control techniques have been
widely employed to improve the reliability of communications
against the impairments of the wireless channel [1]-[3]. In
time-varying channels, the transmission of multiple copies of 
a packet can provide diversity and improve the Quality of 
Service (QoS) of the link. Implementations of retransmission- 
based error control techniques range from pure Automatic Re- 
transmission reQuest (ARQ), where packets are sent uncoded 
over the channel, to hybrid ARQ, which introduces packet 
encoding and memory of previous transmissions [4]-[6]. 

ARQ techniques have been mostly studied in single link 
scenarios [1]-[6]. This paper studies ARQ in interference
networks, where multiple sources may access the same time- 
frequency resource. Mutual interference couples the behavior 
and effectiveness of link level ARQ protocols. This, in turn, 
couples the stochastic evolution of the content of each link's 
buffer. For example, two links simultaneously transmitting can
adversely affect the packet error probability of each link and
thus, through the ARQ protocol, the contents of each link's
buffer.

The coupling between interference and ARQ process has 
been studied in cognitive networks, where the ARQ protocol 
of the primary sources is fixed [7], [8]. In this paper, we 
instead center the discussion on the optimization of multiple 
and mutually inter-dependent retransmission processes with 
QoS constraints. 

We consider a network of multiple sources with packet 
arrival, buffering and memoryless retransmission-based error 
control. The network is modeled as a collection of inter- 
dependent stochastic processes. A constrained infinite-horizon 
Markov Decision Process (MDP) is formulated in order to 
optimize the transmission/interference strategy of the sources. 
Performance metrics such as packet delivery probability, av- 
erage throughput, total packet delay and unit of energy spent 
per unit of throughput are the objective/constraint functions of 
the optimization problem. 



The MDP is solved through a linear fractional program, 
where the optimization variables are the steady-state probabil- 
ity of state-action pairs. Optimizing the ratio of time averaged 
cost functions yields optimal retransmission strategies. In fact, 
the formalization as a linear fractional program enables an easy 
incorporation in the optimization problem of individual packet 
performance and relevant tradeoffs (e.g., energy per unit of 
throughput, average delay over failure probability), which are 
obtained as ratio of time-averages of cost functions defined 
on the state-action space. To the best of our knowledge, this 
is the first formalization of an MDP problem incorporating 
these objective/constraints. Interestingly, the proposed frame- 
work finds connections with optimization frameworks used to 
minimize the cost per unit of time in controlled semi-Markov 
processes [9], [10]. 

The observation of the optimal transmission/interference 
strategies enables the understanding of objective/constraints 
related behaviors, which may serve as guidelines for practical 
protocols. Numerical results are provided for a network with 
two sources with the goal of minimizing the aggregate average 
energy per unit of throughput with constraints on individual 
source's throughput, individual packet total delay and failure 
probability. 

The remainder of the paper is organized as follows. In
Section II, the considered network is described. Section III
defines the stochastic model of the network, the performance
metrics and the optimization problem. In Section IV, the
linear fractional program used to solve the constrained
infinite-horizon MDP addressed in this paper is described in detail.
Section V provides a renewal interpretation of some performance
metrics. Section VI investigates the optimal strategy for
an instantiation of the network.

II. Network Description 

A single-hop network of $S$ sources is considered. Each
source $s=1,\dots,S$ stores packets to be delivered to its intended
destination in a finite First-In First-Out buffer of size
$B$ packets.

Sources adopt a memoryless ARQ retransmission-strategy 
in order to improve packet delivery probability. Therefore, 
prior transmissions of the wanted packet are discarded at 
the receiver. More refined retransmission protocols providing 
combination of packets referring to the same information 
content, such as type-II hybrid ARQ, can be incorporated in 
the model at the price of a larger state space of the stochastic 
model. 

$$\mathbb{P}(Y_k=y_k \mid X_k=x_k,\dots,X_0=x_0,\,Y_{k-1}=y_{k-1},\dots,Y_0=y_0,\,U_k=u_k,\dots,U_0=u_0) = \mathbb{P}(Y_k=y_k \mid X_k=x_k,\,U_k=u_k) = P(y_k \mid x_k,u_k). \tag{2}$$

We fix a maximum time interval for packet service. The
transmission/interference strategy, then, defines packet
retransmission within this interval. The service interval of a packet
is defined as the time elapsed between the slot in which it becomes the
oldest in the queue and the slot in which it is removed from the buffer.
Time slotted operations are assumed, where the duration of the transmission
of a packet plus its associated ARQ feedback fits within the
duration of one time slot. The maximum service time, then, is
fixed to $F$ slots, which corresponds to the maximum number of
transmissions of a packet. A packet is removed from the buffer
either if it is successfully delivered to the intended destination or
if it has been in service for $F$ slots.
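As a rough illustration of what a cap of $F$ transmissions implies, the sketch below computes the delivery probability and the expected number of transmissions of one packet. It assumes, purely for illustration, that the packet is transmitted in every slot of its service interval and that slots fail independently with a fixed probability; the function name and both assumptions are hypothetical and are not part of the model above, where failures depend on the other sources' activity.

```python
def truncated_arq_stats(p_fail, max_tx):
    """Return (delivery probability, expected number of transmissions)
    for a packet sent in every slot of its service interval.

    Assumption (not from the paper): i.i.d. slot failures with
    probability p_fail and no idle slots.
    """
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must lie in [0, 1)")
    # The packet is lost only if all max_tx attempts fail.
    delivery = 1.0 - p_fail ** max_tx
    # Attempt j takes place only if the first j-1 attempts failed.
    expected_tx = sum(p_fail ** (j - 1) for j in range(1, max_tx + 1))
    return delivery, expected_tx


delivery, expected_tx = truncated_arq_stats(0.2, 5)
```

Under these assumptions the expected number of transmissions equals $(1-p^F)/(1-p)$, which is the geometric sum evaluated in the loop.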

Packet arrival in the buffer of each individual source is
modeled through the variable $a_s$, denoting the probability
that a new packet arrives in the buffer of source $s$ in a slot.
This simple model yields a clear
relationship between the transmission/interference strategy and
the queue/service time state of the network. More involved
packet arrival processes (e.g., Markovian arrivals) can be easily
incorporated into the framework.

The sources' transmission/interference strategy is the solu- 
tion of an offline optimization problem that maximizes a per- 
formance metric subject to QoS constraints. In particular, the 
optimization problem is formalized as a constrained infinite- 
horizon undiscounted MDP, where the optimal policy controls 
packet transmission and dropping at each individual source. 

The next section describes, in detail, the stochastic model of 
the network, the selected performance metrics and the MDP. 
The formulation of the linear fractional program used to solve
the optimization problem is provided in Section IV.

III. Stochastic Model and Performance Metrics 

The network is modeled as a collection of random processes
and control sequences tracking individual sources' state (queue 
length and service time) and actions (packet transmission 
and dropping from the buffer). Interference ties together the 
stochastic processes of the individual sources. In fact, the 
success probability of a source's transmission depends on the 
set of sources which transmit in the time slot. Therefore, 
other sources' activity determines the probability that a packet 
is removed from the queue due to successful delivery or 
experiences continued service because of a failed transmission. 
Moreover, in the case considered, the optimal policy is a
randomized stationary policy (see Section IV), in which the
probability that an action is chosen is a function of the overall
state of the network.

In order to characterize the performance of the aggregate 
network and of the individual sources, a set of cost functions 
mapping the state-action space to a real cost is defined. The 
performance metrics are, in turn, defined as ratios of
time-averages of those cost functions. As explained later in this
section, this construction enables the formalization of indi- 
vidual packet and individual source performance, as well as 
relevant tradeoffs, required to accurately track the performance 
of the retransmission and channel access strategy. 



A. Stochastic Model of the Network 

Consider the homogeneous random processes $X = \{X_0, X_1, X_2, \dots\}$
and $Y = \{Y_0, Y_1, Y_2, \dots\}$, where $X_k$ and
$Y_k$, $k = 0,1,2,\dots$, take values in the finite state spaces
$\mathcal{X}$ and $\mathcal{Y}$, respectively. We also define the control
sequence $U=\{U_0, U_1, U_2, \dots\}$, where the control variables $U_k$,
$k=0,1,2,\dots$, take values in the finite action set $\mathcal{U}$. Process
$X$ models the correlated temporal evolution of the network
given the control sequence $U$, whereas process $Y$ represents
a sequence of random outcomes of state-action pairs.

The transition probabilities of $X$ are denoted by

$$P(x_{k+1} \mid x_k,y_k,u_k) = \mathbb{P}(X_{k+1}=x_{k+1} \mid X_k=x_k,\,Y_k=y_k,\,U_k=u_k), \tag{1}$$

where $\mathbb{P}(\cdot)$ denotes the probability of an event. The probability
that $Y_k$ takes a certain value $y_k$ does not depend on the past
history of $X$ and $Y$, but only on the action variable $u_k$ and on
the current state of process $X$ (see Eq. (2)). The probability
that $X$ moves from state $x_k$ to $x_{k+1}$ conditioned on action
$u_k$ is

$$P(x_{k+1} \mid x_k,u_k) = \sum_{y_k\in\mathcal{Y}} P(x_{k+1} \mid x_k,y_k,u_k)\,P(y_k \mid x_k,u_k). \tag{3}$$
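The marginalization in Eq. (3) can be sketched in a few lines. The toy chain below (one source, service time in {1, 2}, success probability 0.8 when transmitting) and all function names are hypothetical and serve only to show how the outcome variable $y_k$ is summed out.

```python
def marginal_transition(p_next, p_outcome, x, u, states, outcomes):
    """Eq. (3): P(x'|x,u) = sum_y P(x'|x,y,u) P(y|x,u), as a dict over x'."""
    return {
        x_next: sum(p_next(x_next, x, y, u) * p_outcome(y, x, u) for y in outcomes)
        for x_next in states
    }


# Hypothetical single-source example: state = service time f in {1, 2},
# outcome y = 1 (success) or 0 (failure), action u = 1 (transmit) or 0 (idle).
def p_outcome(y, x, u):
    success = 0.8 if u == 1 else 0.0
    return success if y == 1 else 1.0 - success


def p_next(x_next, x, y, u):
    # A success, or the expiry of the service time (f = 2), restarts service
    # at f = 1; a failure at f = 1 moves the packet to f = 2.
    target = 1 if (y == 1 or x == 2) else 2
    return 1.0 if x_next == target else 0.0


row = marginal_transition(p_next, p_outcome, x=1, u=1, states=[1, 2], outcomes=[0, 1])
```

Here `row` holds the transition probabilities out of state 1 under the transmit action, with the success/failure outcome averaged out.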

The process $X$ tracks the state of the sources in terms
of queue length and service time. In particular, the state
of $X$ at time $k$ is decomposed into $S$ variables $X_k(s)$,
$s=1,\dots,S$, with $X_k(s)\in\mathcal{X}(s)=0\cup\{1,\dots,F\}\times\{1,\dots,B\}$.$^{1}$
$X_k(s)=0$ means that source $s$ has an empty buffer,
whereas $X_k(s)=\{b_k(s), f_k(s)\}$, with $f_k(s)=1,\dots,F$ and
$b_k(s)=1,\dots,B$, means that source $s$ has $b_k(s)$ packets in its
buffer and the packet currently under service has been served
for $f_k(s)$ slots.

The policy $\mu$ controls sources' access and packet dropping
from the buffer. Assuming causal control, $\mu$ is a function
of the past states of the processes and control variables,
i.e., $u_k=\mu(x_0,\dots,x_{k-1},y_0,\dots,y_{k-1},u_0,\dots,u_{k-1})$. However,
for the optimization problem formalized in the following,
there exists an optimal randomized stationary policy [11].
The control variable $U_k$ can be split into individual source
variables $U_k(s)$, $s=1,\dots,S$, determining source $s$'s transmission
and packet dropping in the time slot $k$.$^{2}$ In particular,
$U_k(s)=(T_k(s),D_k(s))$, where $T_k(s)=1$ and $T_k(s)=0$ correspond
to transmission and idleness in slot $k$, respectively, and
$D_k(s)=1$ and $D_k(s)=0$ correspond to packet dropping and
permanence in the buffer of the packet currently being served.
Note that if $X_k(s)=0$, i.e., source $s$ has an empty buffer,
then $T_k(s)$ and $D_k(s)$ are forced to zero. Moreover, $D_k(s)$
is forced to one if $X_k(s) = (b_k(s), F)$, as the packet currently
under service is always dropped after $F$ slots.
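The two restrictions on the per-source action can be written as a small helper. This is only a sketch: the function name and the state encoding (0 for an empty buffer, a pair `(b, f)` otherwise) are illustrative assumptions, and no restriction beyond the two stated in the text is encoded.

```python
def admissible_actions(state, max_service):
    """List the (T, D) pairs allowed for one source.

    state is 0 for an empty buffer, else a pair (b, f) with queue length b
    and service time f. Only the two rules stated in the text are applied;
    any further restriction would be an additional assumption.
    """
    if state == 0:
        return [(0, 0)]                 # nothing to transmit or drop
    _, f = state
    if f == max_service:
        return [(0, 1), (1, 1)]         # dropping is mandatory after F slots
    return [(t, d) for t in (0, 1) for d in (0, 1)]


actions_empty = admissible_actions(0, max_service=5)
actions_last_slot = admissible_actions((1, 5), max_service=5)
actions_generic = admissible_actions((2, 3), max_service=5)
```

The joint action set of the network is then the Cartesian product of the per-source lists.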

1 We recall that $B$ is the size of the buffer and $F$ is the maximum service
time.

2 Power control, and in general any transmission parameter, can be included 
in the model by extending the set U. 



The random process $Y$ tracks the success/failure of all
the sources of the network. In particular, $Y_k$ takes values in
$\mathcal{Y}=\{0,1\}^S$. Again, the variable $y\in\mathcal{Y}$ is decomposed into
multiple variables $y(s)\in\{0,1\}$. $y(s)=0$ and $y(s)=1$ correspond
to failure and success of source $s$'s transmission, respectively.
The success probability of source $s$ in state $x$ given that action
$u$ is chosen is denoted by $p_s(x,u)=P(y_k(s)=1 \mid x_k,u_k)$. If
$T(s)=0$, i.e., source $s$ is idle in slot $k$, then $p_s(x,u)=0$.

B. Performance Metrics and Optimization Problem 

Much of the prior work on the optimization of transmission
scheduling focused on performance metrics such as
throughput [12]-[14]. Alternatively, packet delay can be constrained
using Lyapunov functions [15]. In order to characterize
the performance of individual sources and individual packet
transmissions, we propose the construction of specific
objective/constraint functions defined as ratios of time-averages of
cost functions.

In particular, we define the set of cost functions
$z_a : \mathcal{X}\times\mathcal{Y}\times\mathcal{U}\to\mathbb{R}$, $a=1,\dots,A$, which assign to the triple $(x,y,u)$
a finite cost $z_a(x,y,u)$, for any $x\in\mathcal{X}$, $y\in\mathcal{Y}$ and $u\in\mathcal{U}$. The
time-average of the cost function $z_a$ is defined as

$$\bar z_a(U) = \limsup_{n\to+\infty} \frac{1}{n}\sum_{k=1}^{n} E[z_a(X_k,Y_k,U_k)], \tag{4}$$

$a=1,\dots,A$, where $E[\cdot]$ is the expectation operator.

The objective and constraint functions are defined as ratios
of time-averages of cost functions,

$$R(U) = \bar z_{r_n}(U)/\bar z_{r_d}(U) \tag{5}$$
$$C_q(U) = \beta_q\,\bar z_{a_n(q)}(U)/\bar z_{a_d(q)}(U) + \lambda_q, \tag{6}$$

respectively, where $\lambda_q$ and $\beta_q$ are constants in $\mathbb{R}$, and $r_n$, $r_d$,
$a_n(q)$ and $a_d(q)$ are indexes in $1,\dots,A$.

The optimization problem is the determination of the
sequence $U$ that minimizes the objective function $R(U)$ over
all the control sequences in $\mathcal{U}^{\infty}$ subject to $M_c$ constraints on
the functions $C_q(U)$. Formally,

$$\hat U = \arg\inf_{U\in\mathcal{U}^{\infty}} R(U) \quad \text{s.t.}\quad C_q(U) \le \gamma_q, \quad \text{for } q=1,2,\dots,M_c. \tag{7}$$



The above optimization problem represents a constrained 
infinite-horizon Markov Decision Process (MDP). 

Assuming X is unichain, i.e., the transition matrix for any 
stationary deterministic policy has a single recurrent class plus 
a (perhaps empty) set of transient states [16], then there exists 
an optimal stationary randomized policy solving the above 
optimization problem [11]. Moreover, the optimal policy has 
at most $M_c$ randomizations, i.e., states in which the policy is
non-deterministic [11]. 

In the following, the performance metrics used to characterize
the performance of the network are listed. The average
normalized throughput of the source $s$ is the time-average of
the cost function

$$z_1(x_k,y_k,u_k) = \begin{cases} p_s(x_k,u_k) & \text{if } u_k(s): t_k(s)=1,\\ 0 & \text{otherwise,} \end{cases} \tag{8}$$

where $u_k(s) = (t_k(s),d_k(s))$. Similarly, the average normalized
energy expense of source $s$ is the time average of the cost
function

$$z_2(x_k,y_k,u_k) = t_k(s). \tag{9}$$

Note that the aggregate normalized throughput and energy
expense can be obtained as the sum of the individual source
throughput and energy expense. The ratio $\bar z_2/\bar z_1$ measures the
efficiency of source $s$'s transmission/interference strategy in
terms of units of energy spent per unit of delivered traffic.

Individual packet performance metrics such as packet success
probability, number of transmissions and total delay
can be obtained as ratios of time-averages of appropriate cost
functions. In the first two metrics, the number of delivered
packets or overall transmissions needs to be normalized to
the number of effectively served packets, which is a function
of the policy. The average total delay, i.e., the average time a
packet spends in the buffer, is computed as the ratio between
the average queue level and the average number of packet
arrivals.

The fraction of slots in which source $s$ successfully delivers
a packet to the intended destination is $\bar z_3$, where

$$z_3(x_k,y_k,u_k) = \begin{cases} 1 & \text{if } y_k(s)=1,\\ 0 & \text{otherwise,} \end{cases} \tag{10}$$

and $p_s(x,u)=1$ if $u: t(s)=0$, i.e., source $s$ is idle. The time
average $\bar z_4$ of the cost function

$$z_4(x_k,y_k,u_k) = \begin{cases} 1 & \text{if } f_k(s)=1,\\ 0 & \text{otherwise,} \end{cases} \tag{11}$$

measures the fraction of time in which source $s$ starts the
service of a new packet. The ratio $\bar z_3/\bar z_4$ corresponds to the
average number of packets successfully delivered by source $s$
normalized to the number of served packets, i.e., the success
probability of source $s$'s packets. In fact,

$$\frac{\bar z_3}{\bar z_4} = \frac{\limsup_{n\to+\infty} \frac{1}{n}\sum_{k=1}^{n} E[z_3(X_k,Y_k,U_k)]}{\limsup_{n\to+\infty} \frac{1}{n}\sum_{k=1}^{n} E[z_4(X_k,Y_k,U_k)]}. \tag{12}$$



Note that the cost functions $z_3$ and $z_4$ are indicator functions
of subsets of the state-action space of the network. In this
case, the associated time-averages correspond to a probability
measure. In particular, the time average of a cost function
sampling the occurrence of a subset of the state-action space
is the steady-state probability of the subset. As discussed in
detail in Section V, in this case, the ratio of time-averages
assumes a particular meaning connected to renewal theory.
The average number of transmissions of a packet of source $s$
is $\bar z_2/\bar z_4$.

According to Little's law [17], the total delay of a packet of
source $s$, defined as the average time a packet spends in the
buffer of source $s$, can be measured as the ratio $\bar z_6/\bar z_7$, where

$$z_6(x_k,y_k,u_k) = b_k(s), \tag{13}$$

and

$$z_7(x_k,y_k,u_k) = \begin{cases} & \text{if } x_k(s): b_k(s) < B,\\ & \text{if } x_k(s): b_k(s) = B. \end{cases} \tag{14}$$

In fact, $\bar z_6/\bar z_7$ corresponds to the ratio of the average queue
length and the average number of packets arrived in the buffer
of source $s$.
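The Little's-law ratio can be illustrated on a short sample trace. The sketch below uses empirical averages over a hypothetical 8-slot trace in place of the expectations in $\bar z_6$ and $\bar z_7$; the trace values and the function name are assumptions made only for the example.

```python
def average_total_delay(queue_lengths, accepted_arrivals):
    """Ratio of time averages: mean queue length / mean accepted arrivals per slot."""
    if len(queue_lengths) != len(accepted_arrivals):
        raise ValueError("traces must have the same length")
    mean_queue = sum(queue_lengths) / len(queue_lengths)
    mean_arrivals = sum(accepted_arrivals) / len(accepted_arrivals)
    if mean_arrivals == 0:
        raise ValueError("no arrivals: the delay is undefined")
    return mean_queue / mean_arrivals


# Hypothetical trace: queue length b_k and packets admitted in each slot.
delay = average_total_delay([1, 2, 2, 1, 1, 2, 3, 2], [1, 1, 0, 0, 1, 1, 0, 0])
```

With a mean queue length of 1.75 packets and 0.5 admitted packets per slot, the estimated total delay is 3.5 slots.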

IV. Optimization Framework 

The optimal policy is a stationary randomized policy
$\mu : \mathcal{X}\times\mathcal{U}\to[0,1]$, where $\mu(x,u)$ indicates the probability that
action $u\in\mathcal{U}$ is selected in state $x\in\mathcal{X}$. Given the policy $\mu$, it is
possible to define the transition kernel

$$P_\mu(x_{k+1} \mid x_k) = \sum_{y_k\in\mathcal{Y},\,u_k\in\mathcal{U}} P(x_{k+1} \mid x_k,y_k,u_k)\,P(y_k \mid x_k,u_k)\,\mu(x_k,u_k) \tag{15}$$

$\forall x_k,x_{k+1}\in\mathcal{X}$, which denotes the probability that $X$ moves
from state $x_k$ to state $x_{k+1}$ under policy $\mu$.

Since $X$ is unichain, then for any $\mu$ as defined above
the limiting distribution $\pi_\mu(x) = \lim_{t\to+\infty} P_\mu^t(x \mid x')$ exists
$\forall x,x'\in\mathcal{X}$ [16], where $P_\mu^t(x \mid x')$ is the $t$-step transition
probability from state $x'$ to state $x$.$^{3}$ We remark that since the limit
$\lim_{t\to+\infty} P_\mu^t(x \mid x')$ converges to $\pi_\mu(x)$, then [18]

$$\pi_\mu(x) = \lim_{m\to+\infty} \frac{1}{m}\sum_{k=0}^{m-1} P_\mu^k(x \mid x') = \lim_{m\to+\infty} \frac{1}{m}\sum_{k=0}^{m-1} E_\mu[\mathbb{1}(X_k=x \mid X_0=x')], \tag{16}$$

where $E_\mu[\cdot]$ and $\mathbb{1}(\cdot)$ are the expectation operator, conditioned
on policy $\mu$, and the indicator function. Thus, $\pi_\mu(x)$ is the
average fraction of time spent by $X$ in state $x$.$^{4}$

The average cost collected by the network in state $x$
associated with action $u$ is

$$z_a(x,u) = \sum_{y\in\mathcal{Y}} z_a(x,y,u)\,P(y \mid x,u). \tag{17}$$

Therefore, the average cost collected by the network in state
$x$ under policy $\mu$ is

$$z_a(x,\mu) = \sum_{u\in\mathcal{U}} z_a(x,u)\,\mu(x,u), \quad a=1,\dots,A. \tag{18}$$

The time averages in Eq. (4) can then be rewritten as the
following linear combination$^{5}$

$$\bar z_a(\mu) = \sum_{x\in\mathcal{X}} \pi_\mu(x)\,z_a(x,\mu), \quad a=1,\dots,A. \tag{19}$$

The optimization problem (7) becomes

$$\hat\mu = \arg\inf_{\mu} \frac{\sum_{x\in\mathcal{X}} \pi_\mu(x)\,z_{r_n}(x,\mu)}{\sum_{x\in\mathcal{X}} \pi_\mu(x)\,z_{r_d}(x,\mu)} \quad \text{s.t.}\quad \beta_q\,\frac{\sum_{x\in\mathcal{X}} \pi_\mu(x)\,z_{a_n(q)}(x,\mu)}{\sum_{x\in\mathcal{X}} \pi_\mu(x)\,z_{a_d(q)}(x,\mu)} + \lambda_q \le \gamma_q, \quad q=1,\dots,M_c, \tag{20}$$

where $\hat\mu$ denotes the optimal stationary policy. Since $\mathcal{U}$ is
finite, and the limiting distribution exists, the above optimization
problem can be restated as a Linear Program optimizing over
the admissible polyhedron of the steady-state distribution of
the state-action pairs [11].

Define the optimization variable $\omega_{x,u}$ as the probability that
the process $X$ is in state $x$ and action $u$ is chosen. The reward
function $R(\mu)$ and the constraint functions $C_q(\mu)$ can then be
expressed as

$$R(\mu) = \frac{\sum_{x\in\mathcal{X}}\sum_{u\in\mathcal{U}} z_{r_n}(x,u)\,\omega_{x,u}}{\sum_{x\in\mathcal{X}}\sum_{u\in\mathcal{U}} z_{r_d}(x,u)\,\omega_{x,u}} \tag{21}$$
$$C_q(\mu) = \beta_q\,\frac{\sum_{x\in\mathcal{X}}\sum_{u\in\mathcal{U}} z_{a_n(q)}(x,u)\,\omega_{x,u}}{\sum_{x\in\mathcal{X}}\sum_{u\in\mathcal{U}} z_{a_d(q)}(x,u)\,\omega_{x,u}} + \lambda_q \tag{22}$$

or equivalently

$$R(\mu) = z_{r_n}^T\omega / z_{r_d}^T\omega \tag{23}$$
$$C_q(\mu) = \beta_q\,z_{a_n(q)}^T\omega / z_{a_d(q)}^T\omega + \lambda_q \tag{24}$$

where $T$ denotes the transpose operator, and
$z_a = [z_a(x,u)]_{x\in\mathcal{X},u\in\mathcal{U}}$ and $\omega = [\omega_{x,u}]_{x\in\mathcal{X},u\in\mathcal{U}}$ are $|\mathcal{X}\times\mathcal{U}|$ column
vectors listing the costs and the steady-state probabilities
associated with state $x$ and decision $u$, $\forall x,u$.

3 The $t$-step transition probabilities can be inductively found from
$P_\mu(x \mid x')$ [16].

4 We underline that, under the hypothesis that the chain is unichain, the
limiting distribution of the chain is independent of the initial state.

5 In the following notation, the action sequence $U$ is substituted with the
function $\mu$.
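To make the passage from a policy to a ratio of time-averages concrete, the sketch below computes the limiting distribution of a small induced chain by power iteration and evaluates an energy-per-delivery ratio in the form of Eq. (19). The two-state chain (a saturated source with $F=2$ that transmits with probability $q$ and succeeds with probability 0.8), the function names and the use of power iteration are illustrative assumptions, not the procedure used in the paper.

```python
def stationary_distribution(kernel, iterations=2000):
    """Power iteration pi <- pi * P for a row-stochastic matrix (list of rows)."""
    n = len(kernel)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * kernel[i][j] for i in range(n)) for j in range(n)]
    return pi


def ratio_of_time_averages(pi, cost_num, cost_den):
    """(sum_x pi(x) z_num(x, mu)) / (sum_x pi(x) z_den(x, mu))."""
    num = sum(p * c for p, c in zip(pi, cost_num))
    den = sum(p * c for p, c in zip(pi, cost_den))
    return num / den


# Hypothetical saturated source with F = 2; states are f = 1 and f = 2.
q, success = 0.5, 0.8
kernel = [[q * success, 1.0 - q * success],   # from f = 1
          [1.0, 0.0]]                          # from f = 2 the packet always leaves
pi = stationary_distribution(kernel)
energy = [q, q]                                # expected transmissions per slot
throughput = [q * success, q * success]        # expected deliveries per slot
energy_per_delivery = ratio_of_time_averages(pi, energy, throughput)
```

In this toy case the ratio is 1.25 regardless of $q$, since every transmission succeeds with the same probability.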

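Note that the constraints can be restated as the following
linear combinations of the variables $\omega$

$$(\beta_q z_{a_n(q)} + (\lambda_q - \gamma_q)\,z_{a_d(q)})^T\omega \le 0, \quad q=1,\dots,M_c \tag{25}$$

and collected in the matrix form $Z\omega \le 0$, where $Z$ is an $M_c \times
|\mathcal{X}\times\mathcal{U}|$ matrix. Define, with a slight abuse of notation, $P$ as
a $|\mathcal{X}| \times |\mathcal{X}\times\mathcal{U}|$ matrix, such that the element in the column
and row corresponding to the pair $(x',u)$ and $x$ is equal to
$1-P(x \mid x',u)$ if $x=x'$, and $-P(x \mid x',u)$ if $x\neq x'$.

The optimization problem can then be formalized as the
following Linear-fractional Program [19]

$$\hat\omega = \arg\min\,(z_{r_n}^T\omega)/(z_{r_d}^T\omega) \quad \text{s.t.}\quad Z\omega \le 0_{M_c,1}, \quad \begin{bmatrix} 1_{1,|\mathcal{X}\times\mathcal{U}|} \\ P \end{bmatrix}\omega = \begin{bmatrix} 1 \\ 0_{|\mathcal{X}|,1} \end{bmatrix}, \quad \omega_{x,u} \ge 0,\ \forall x\in\mathcal{X},\,u\in\mathcal{U}, \tag{26}$$

where $1_{m,n}$ and $0_{m,n}$ are $m\times n$ matrices whose elements are
set to one and zero, respectively.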
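The step from a ratio constraint to the linear row in Eq. (25) can be checked numerically. The sketch below builds the row for one constraint and compares it with the original ratio form; the vectors and constants are hypothetical, and the equivalence relies on the denominator $z_{a_d(q)}^T\omega$ being positive.

```python
def linearized_constraint_row(z_num, z_den, beta, lam, gamma):
    """Coefficients of (beta*z_num + (lam - gamma)*z_den)^T omega <= 0, as in Eq. (25)."""
    return [beta * n + (lam - gamma) * d for n, d in zip(z_num, z_den)]


def ratio_constraint_holds(omega, z_num, z_den, beta, lam, gamma):
    """Original form: beta * (z_num^T omega) / (z_den^T omega) + lam <= gamma."""
    num = sum(n * w for n, w in zip(z_num, omega))
    den = sum(d * w for d, w in zip(z_den, omega))
    return beta * num / den + lam <= gamma


# Hypothetical data: three state-action pairs and a throughput-style
# constraint written as -(num/den) <= -0.5, i.e. num/den >= 0.5.
omega = [0.5, 0.3, 0.2]
z_num, z_den = [0.8, 0.6, 0.0], [1.0, 1.0, 1.0]
row = linearized_constraint_row(z_num, z_den, beta=-1.0, lam=0.0, gamma=-0.5)
linear_value = sum(r * w for r, w in zip(row, omega))
ratio_ok = ratio_constraint_holds(omega, z_num, z_den, beta=-1.0, lam=0.0, gamma=-0.5)
```

Both forms agree here: the linear value is negative and the ratio constraint holds.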

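The equality constraints force $\omega$ to be an admissible steady-state
distribution for the transition probabilities of $X$, and are
equivalent to

$$\sum_{x\in\mathcal{X}}\sum_{u\in\mathcal{U}} \omega_{x,u} = 1 \tag{27}$$
$$\sum_{x'\in\mathcal{X}}\sum_{u\in\mathcal{U}} P(x \mid x',u)\,\omega_{x',u} = \sum_{u\in\mathcal{U}} \omega_{x,u}, \quad \forall x\in\mathcal{X}. \tag{28}$$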
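The normalization and balance conditions in Eqs. (27) and (28) can be verified for a state-action distribution built from a policy and its stationary distribution. The two-state chain, the uniform policy and the helper names below are hypothetical and only illustrate the check.

```python
def occupation_measure(pi, policy):
    """omega[x][u] = pi(x) * mu(x, u)."""
    return [[pi[x] * policy[x][u] for u in range(len(policy[x]))]
            for x in range(len(pi))]


def balance_residuals(omega, transition):
    """Residual of Eq. (28) per state; transition[x_prev][u][x] = P(x | x_prev, u)."""
    n = len(omega)
    residuals = []
    for x in range(n):
        inflow = sum(transition[xp][u][x] * omega[xp][u]
                     for xp in range(n) for u in range(len(omega[xp])))
        residuals.append(inflow - sum(omega[x]))
    return residuals


# Hypothetical chain over service times f = 1, f = 2; actions 0 = idle, 1 = transmit.
transition = [
    [[0.0, 1.0], [0.8, 0.2]],   # from f = 1: idling ages the packet, transmitting succeeds w.p. 0.8
    [[1.0, 0.0], [1.0, 0.0]],   # from f = 2: the packet leaves in either case
]
policy = [[0.5, 0.5], [0.5, 0.5]]
pi = [0.625, 0.375]             # stationary distribution of the induced chain
omega = occupation_measure(pi, policy)
residuals = balance_residuals(omega, transition)
total_mass = sum(sum(row) for row in omega)
```

All residuals vanish and the total mass is one, so this $\omega$ satisfies the equality constraints of the program.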



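If $\{\omega : Z\omega \le 0,\ -I\omega \le 0,\ 1^T\omega = 1,\ P\omega = 0,\ z_{r_d}^T\omega > 0\}$
is a feasible set, then the above problem can be easily
transformed to the following equivalent linear program via the
change of variables $\kappa = g\,\omega$ [19, Ch. 4.2.3]:

$$\{\hat\kappa,\hat g\} = \arg\min\, z_{r_n}^T\kappa \quad \text{s.t.}\quad Z\kappa \le 0_{M_c,1}, \quad \begin{bmatrix} 1_{1,|\mathcal{X}\times\mathcal{U}|} & -1 \\ z_{r_d}^T & 0 \\ P & 0_{|\mathcal{X}|,1} \end{bmatrix} \begin{bmatrix} \kappa \\ g \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0_{|\mathcal{X}|,1} \end{bmatrix}, \quad g \ge 0,\ \kappa_{x,u} \ge 0,\ \forall x\in\mathcal{X},\,u\in\mathcal{U}. \tag{29}$$

If $\exists u\in\mathcal{U} : \hat\omega_{x,u} > 0$, then the optimal time-sharing map
in state $x$ is $\hat\mu(x,u) = \hat\omega_{x,u}/\sum_{u\in\mathcal{U}} \hat\omega_{x,u}$. If $\sum_{u\in\mathcal{U}} \hat\omega_{x,u}=0$,
i.e., $x$ is transient under the optimal policy, then a deterministic
action can be chosen at random such that $\hat\mu(x,u)=1$ if $u=u^*\in\mathcal{U}$
and $\hat\mu(x,u)=0$ otherwise.

$$\bar z_\phi(\mu) = \limsup_{n\to+\infty} \frac{1}{n} E_\mu\Big[\sum_{k=1}^{n} z_\phi(X_k,Y_k,U_k)\Big] = \limsup_{n\to+\infty} \frac{1}{n}\sum_{k=1}^{n} \mathbb{P}_\mu(X_k\in\mathcal{X}_\phi,\,Y_k\in\mathcal{Y}_\phi,\,U_k\in\mathcal{U}_\phi \mid x') = \limsup_{n\to+\infty} \frac{1}{n}\sum_{k=1}^{n} \Big(\sum_{x\in\mathcal{X}_\phi,\,y\in\mathcal{Y}_\phi,\,u\in\mathcal{U}_\phi} \mathbb{P}_\mu(X_k=x \mid X_0=x')\,\mathbb{P}(Y_k=y \mid X_k=x,U_k=u)\,\mathbb{P}(U_k=u \mid X_k=x)\Big) \tag{31}$$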
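Once a solver returns the pair $(\kappa, g)$ of the transformed program, the policy follows by undoing the change of variables and normalizing per state. The sketch below assumes a hypothetical solver output stored as one row per state; the function name, the tolerance and the choice of a default action for transient states are illustrative assumptions.

```python
def recover_policy(kappa, g, n_actions, default_action=0, tol=1e-12):
    """Map an LP solution (kappa, g) to omega = kappa / g and then to mu(x, u).

    kappa is a list of rows, one per state, each with n_actions entries.
    States with zero total mass (transient) get the arbitrary default_action.
    """
    if g <= 0:
        raise ValueError("g must be positive to invert the change of variables")
    policy = []
    for row in kappa:
        omega_row = [k / g for k in row]
        mass = sum(omega_row)
        if mass > tol:
            # Time-sharing map: omega_{x,u} normalized over the actions in state x.
            policy.append([w / mass for w in omega_row])
        else:
            policy.append([1.0 if u == default_action else 0.0
                           for u in range(n_actions)])
    return policy


# Hypothetical LP output for three states and two actions.
policy = recover_policy([[0.3, 0.9], [0.0, 0.8], [0.0, 0.0]], g=2.0, n_actions=2)
```

The first state is randomized, the second is deterministic, and the third (zero mass) falls back to the default action.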

V. Renewal Interpretation 

In this section, we discuss the case in which the cost 
functions are used to sample the occurrence of a subset of 
the state-action space. For instance, a cost function indicating 
the occurrence of the first slot of a service interval of a new 
packet can be used to measure the number of packets served by 
a source. The time-average of this cost function corresponds 
to the fraction of slots in which this specific state occurs, i.e., 
its steady-state probability. 

In this case, the ratio of average functions can be interpreted
as the average number of occurrences of a subset of the state- 
action space per renewal interval, where the renewal event 
corresponds to the occurrence of another subset of the state- 
action space. 

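Define the event $\phi$ as the set $\mathcal{X}_\phi\times\mathcal{Y}_\phi\times\mathcal{U}_\phi$, where $\mathcal{X}_\phi\subseteq\mathcal{X}$,
$\mathcal{Y}_\phi\subseteq\mathcal{Y}$ and $\mathcal{U}_\phi\subseteq\mathcal{U}$. We then say that event $\phi$ occurs at time $k$
if $X_k\in\mathcal{X}_\phi$, $Y_k\in\mathcal{Y}_\phi$ and $U_k\in\mathcal{U}_\phi$. In words, event $\phi$ occurs
at time $k$ if the process $X$ enters the set of states $\mathcal{X}_\phi$, the
value of the random process $Y$ belongs to the subset $\mathcal{Y}_\phi$ and
an action in $\mathcal{U}_\phi$ is selected. The time-average $\bar z_\phi(\mu)$ of the sampling
function

$$z_\phi(x_k,y_k,u_k) = \begin{cases} 1 & \text{if } \{x_k,y_k,u_k\}\in\mathcal{X}_\phi\times\mathcal{Y}_\phi\times\mathcal{U}_\phi,\\ 0 & \text{otherwise,} \end{cases} \tag{30}$$

measures the average fraction of time in which event $\phi$ occurs.

Note that, in this case, $\bar z_\phi(\mu)$ is a probability measure, which
we denote by $\pi_\mu(\phi)$ and that corresponds to the probability
that $\phi$ occurs in a randomly chosen slot (see Eq. (31)). Moreover,
the average time between two consecutive occurrences
of $\phi$ is $E_\mu[\tau_\phi(\ell)]=T_\phi=1/\pi_\mu(\phi)$ [18], where $\tau_\phi(\ell)$ is the time
between the $\ell$-th and the $(\ell+1)$-th occurrence of event $\phi$.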

Consider now two events $\phi$ and $\psi$. Assume $\bar z_\psi(\mu)>0$, that
is, the number of times the network hits event $\psi$ in an infinite
sample-path is infinite. The ratio $\bar z_\phi(\mu)/\bar z_\psi(\mu)=\pi_\mu(\phi)/\pi_\mu(\psi)$
can be used to formulate performance metrics expressed as
the average number of occurrences of $\phi$ per occurrence of
$\psi$, e.g., average number of transmissions per packet. In other
words, $\bar z_\phi(\mu)/\bar z_\psi(\mu)=\pi_\mu(\phi)/\pi_\mu(\psi)$ is the ratio between the
frequencies of the two events.

It can be observed that, due to the characterization of $X$
and $Y$, the occurrence of an event $\psi$ is a renewal event [18]
for the network, meaning that the future evolution after an
occurrence of $\psi$ does not depend on the past history of the
processes. As a consequence, the sample path of $X$ and $Y$ can
be split into renewal intervals [18] defined by the occurrence
of $\psi$. The functionals of the states of $X$ and $Y$ computed in
any renewal interval have the same distribution.

Define N$ (t) as the process counting the occurrences of tp 
up to time t. The number of occurrences of <p within the ^-th 
renewal interval is denoted with the random variable VAi). 
The cumulative process W${Z) is then defined as the sum 
W^)=ELi^(0- Note that 

lim E IX [W${£)}/1= lim E^N^/t. (32) 

l—¥ + 00 t— > + 00 



The following holds [18]: 

E^(£)]/r^ = 



lim E^w+wye, 

i— >-+oo 



(33) 



i.e., the average occurrences of $\phi$ per unit of time in any
renewal interval is equal to the average number of occurrences
of $\phi$ per unit of time in the whole sample-path of the process.

We observe that, due to the assumption $\bar z_\psi(\mu)>0$, $E_\mu[V_\phi(\ell)]$
is finite, and the above limit converges. It follows:

$$E_\mu[V_\phi(\ell)] = \frac{\lim_{t\to+\infty} E_\mu[N_\phi(t)]/t}{\lim_{t\to+\infty} E_\mu[N_\psi(t)]/t}, \tag{34}$$

that is, the ratio between the steady-state probabilities of the
two events is equal to the average number of occurrences of $\phi$
in a single renewal interval defined by consecutive occurrences
of $\psi$.
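The renewal interpretation can be checked on a small chain where the answer is known in closed form. The sketch below assumes a hypothetical saturated source that transmits in every slot with i.i.d. failure probability 0.2 and $F=5$; the renewal event is the start of service of a new packet and the counted event is a transmission. The chain, the helper names and the use of power iteration are assumptions made for illustration.

```python
def service_chain(p_fail, max_service):
    """Row-stochastic kernel over service times 1..F for a saturated source
    that transmits in every slot (hypothetical simplification)."""
    kernel = [[0.0] * max_service for _ in range(max_service)]
    for f in range(max_service - 1):
        kernel[f][0] = 1.0 - p_fail      # delivery: a new packet starts service
        kernel[f][f + 1] = p_fail        # failure: service time grows by one slot
    kernel[max_service - 1][0] = 1.0     # last slot: delivered or dropped
    return kernel


def stationary(kernel, iterations=5000):
    """Power iteration pi <- pi * P."""
    n = len(kernel)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * kernel[i][j] for i in range(n)) for j in range(n)]
    return pi


p_fail, F = 0.2, 5
pi = stationary(service_chain(p_fail, F))
pi_counted = 1.0        # counted event: a transmission occurs (every slot here)
pi_renewal = pi[0]      # renewal event: a new packet starts service (f = 1)
tx_per_packet = pi_counted / pi_renewal
closed_form = (1.0 - p_fail ** F) / (1.0 - p_fail)
```

The ratio of steady-state probabilities matches the expected number of transmissions per packet obtained from the truncated geometric sum.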

This observation connects the present work to the framework
presented in [9], [10], in which a linear fractional program
is used to minimize the average cost per unit of time of a
controlled Semi-Markov process. In the framework considered
in [9], [10], an average time interval is associated to each state
of the Markov chain. The denominator of the objective
function, then, is used to measure the average amount of time
the process spends in a state. In the proposed framework, if
the cost functions are used to sample the occurrence of a subset
of the state-action space, the reference time is the average
renewal time, where the renewal intervals are defined by the
occurrence of the event associated to the denominator.

Fig. 1. Average aggregate energy expense per unit of throughput, average throughput, average energy expense and transmission probability as a function of
the constraint on the minimum normalized throughput of source 2.

VI. Numerical Results 

In this section, we provide numerical results for the frame- 
work presented before. In particular, the optimization problem 
is formalized to minimize the aggregate normalized unit of 
energy spent per unit of throughput achieved in a two-source 
network with constraints on the individual source minimum 
throughput, maximum total delay and maximum packet deliv- 
ery failure probability (including retransmissions). This setting 
is motivated by the considerable interest in energy efficient 
wireless communications of late [20]. As a general observa- 
tion, stringent QoS constraints force the system to move from 
time-splitting to simultaneous transmission scheduling. The 
latter achieves, in the considered setting, a larger throughput 
and allows a faster packet delivery with respect to time- 
splitting. On the other hand, simultaneous transmission is less 
efficient, i.e., requires a larger energy expense per unit of 
throughput. 

A buffer of size $B=1$ is assumed in the first two sets
of plots in order to investigate the relation between the
transmission/interference strategy and the service time. Note
that, in this case, if the state of the individual source $s$ is
$x(s)\neq 0$ then $b(s)=1$.

Fig. 1 shows the average aggregate energy expense per unit
of throughput, average throughput, average energy expense
and transmission probability as a function of the constraint on
the minimum normalized throughput of source 2, where the
minimum average normalized throughput of source 1 is fixed
to 0.35. The packet arrival probabilities are $a_1=a_2=0.95$.
The maximum service time and the buffer size are $F=5$ and
$B=1$, respectively. The failure probabilities of a single source
transmitting alone and with interference from the other source
are $p=0.2$ and $p^*=0.4$, respectively. The minimum packet delivery
probability is 0.8. The maximum packet total delay is 3.5 slots.
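The failure probabilities above already hint at the trade-off between time-splitting and simultaneous transmission. The back-of-the-envelope sketch below compares the two extremes under a strong simplification that is not part of the model: both sources are saturated and transmit according to a fixed pattern in every slot, ignoring queues, service-time limits and the QoS constraints. The function name is hypothetical.

```python
def strategy_metrics(failure_prob, n_active):
    """Per-slot aggregate throughput, energy, and energy per unit of throughput
    when n_active saturated sources transmit in every slot with the given
    per-transmission failure probability (hypothetical simplification)."""
    throughput = n_active * (1.0 - failure_prob)
    energy = float(n_active)
    return throughput, energy, energy / throughput


# Failure probabilities of the Fig. 1 setting: 0.2 alone, 0.4 under interference.
time_split = strategy_metrics(0.2, n_active=1)     # one source per slot
simultaneous = strategy_metrics(0.4, n_active=2)   # both sources in every slot
```

Under these assumptions time-splitting spends 1.25 units of energy per delivered packet with an aggregate throughput of 0.8, while simultaneous transmission reaches an aggregate throughput of 1.2 at roughly 1.67 units of energy per delivered packet.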

For the selected parameter setting, the strategy letting only
one source transmit at a given time produces a better energy
over throughput balance with respect to the strategy forcing
both the sources to transmit. On the other hand, the aggregate
throughput achieved with the latter strategy is larger than that
associated with the former strategy. Therefore, as long as the
throughput requirements are below a certain threshold, the
controller allows only a single source to transmit in each slot,
and tunes the fraction of slots assigned to each source in order
to meet the constraints. It can be observed that, given that only
a single source transmits in each slot, the average energy spent
per unit of throughput does not depend on how often a source
transmits, so that the overall balance remains the same. As
soon as the throughput requirement of source 2 goes above
a certain threshold, the controller is forced to let both the
sources transmit in a fraction of slots in order to collect a larger
throughput, thus worsening the energy/throughput balance. It
can be observed that in the region in which the controller
schedules simultaneous transmissions, the source with the
smallest throughput requirement (source 1 in the figures) is
forced to transmit more often than the other source in slots
where both the sources transmit, i.e., those slots providing the
worst energy/throughput balance (Fig. 1(d)). In fact, source
1 spends more energy to collect a unit of throughput with
respect to source 2 (Fig. 1(a)). On the other hand, source
2 is often scheduled in interference free slots in order to
collect throughput, and achieves a higher energy efficiency.
The optimal strategy in this simple configuration suggests
that channel access protocols should schedule transmission by
sources with relaxed QoS constraints in slots accessed by other
sources, while reserving part of the channel resource to sources
with stringent QoS constraints. Note that idle time, which is
initially scheduled in order to save energy, vanishes for high
throughput requirements.

Fig. 2. (a) Transmission probability of source 1 as a function of the state (the transmission probability of source 2 is symmetric). (b) Probability of simultaneous
transmission of both the sources as a function of the state.

Fig. 2 plots the transmission probability as a function of the
state of the network for a similar setting, where the minimum
throughput requirement is equal to 0.45 for both the sources,
$a_1=a_2=0.6$ and the total delay constraint is 5 slots.

Fig. 2(a) shows the transmission probability of source 1 as a
function of the individual states $x(1)$ and $x(2)$.$^{6}$ Interestingly,
transmission probability clusters in the state space. In particular,
source 1 transmits if $f(1)>f(2)$, i.e., the service time
of source 1's packet is larger than that of source 2's packet.
This strategy is meant to reduce the probability of packet
discarding because of maximum service time expiration. The
region of the state space allocated for source 2's transmission
is symmetric to that shown in Fig. 2(a). The probability of
simultaneous transmission, shown in Fig. 2(b), is larger than
zero on the border-region between the transmission areas of
the two sources. Other results, not shown here, indicate that
the area of the state space in which simultaneous transmission
is scheduled grows as the minimum throughput requirement
is increased. This result shows how the state space is split to
achieve the largest energy efficiency with stringent throughput
requirements.

6 The queue state $b(s)$ is omitted because it is either equal to 0 if $x(s)=0$ or 1
if $x(s)\neq 0$.

Figs. 3 and 4 investigate the optimal strategy in a scenario
with larger buffer size ($B=3$ and $F=3$). Fig. 3 plots the
average aggregate energy expense per unit of throughput,
average throughput, average energy expense and transmission
probability as a function of the constraint on the maximum
total delay of source 2's packets. The maximum total delay of
source 1's packets is fixed to 5 slots. The minimum throughput
requirement is equal to 0.3 for both the sources. It can be
observed that, as the total delay constraint on source 2's packets
is relaxed, source 1's throughput, transmission probability and
energy expense increase, whereas those of source 2 decrease.
In fact, a stringent delay constraint forces source 2 to transmit
often in order to deliver packets, whereas source 1 is often
forced to idleness in order to reduce its impact in terms of
interference. Due to the constraint on the delivery probability,
which limits packet discarding, delay and throughput are
connected. In fact, in order to achieve a smaller delay, sources
are forced to transmit, thus increasing the throughput.

Fig. 4(a) and 4(b) plot the energy expense per unit of
throughput in the feasible region. In Fig. 4(a), the x and y axes
are the throughput constraints of source 1 and 2, respectively.
The maximum total delay is 5 slots. In Fig. 4(b), the x and
y axes are the throughput and total delay constraints of both
source 1 and 2. In general, stringent throughput and total
delay constraints require the system to allocate simultaneous
transmissions in some regions of the state space. As simultaneous
transmission requires a larger energy expense per unit
of throughput, the efficiency of the network decreases.

VII. Conclusions 

We present a general framework to find optimal ARQ
strategies. We model the network as a set of three intertwined
stochastic processes. The framework extremizes an MDP
under constraints, using techniques from Linear Fractional
Programming. Different objectives or different constraints will
result in different optimal ARQ strategies.

Here, we consider the objective of minimizing energy expense
normalized by throughput, under constraints on throughput,
delay and packet loss. Numerical results obtained solving
the linear fractional program presented in this work show how
the system allocates transmissions as a function of the state
of the network, highlighting interesting system behaviors.

Fig. 3. Average aggregate energy expense per unit of throughput, average throughput, average energy expense and transmission probability as a function of
the constraint on the maximum total delay of source 2's packets.

(b) Average aggregate energy/throughput